Skip to content

Conversation

@hye-on
Copy link
Contributor

@hye-on hye-on commented Mar 24, 2025

This PR fixes issue #104160 where a duplicate key exception occurs in GetDatafeedRunningStateAction.Response.fromResponses(). The issue happens when a datafeed is force-stopped and restarted before its local task cancellation completes. This creates a situation where two local tasks for the same datafeed temporarily coexist on the ML node (one cancelling, one starting), causing the duplicate key error when both report their state.
The solution implements a merge function in the toMap collector that selects the most appropriate state when duplicates are found, based on the searchInterval data.

The solution implements a merge function in the toMap collector that selects the most appropriate state when duplicates are found, based on the searchInterval data.
Select the most appropriate state based on:

  1. Prefer state with more recent searchInterval.startMs when both exist
  2. Prefer states with searchInterval over those without
  3. Default to second state when all criteria are equal

Comment

I'm new to Elasticsearch and open source contributions in general. I went with searchInterval.startMs as the selection criteria, but I'd appreciate any feedback on whether there might be a better approach for handling these duplicate states. Thank you for your guidance! :)

Fixes #104160

Implement merge function for duplicate datafeed states when a datafeed is force-stopped and restarted before cancellation completes. Select the most appropriate state based on:
1. Prefer state with more recent searchInterval.startMs when both exist
2. Prefer states with searchInterval over those without
3. Default to second state when all criteria are equal
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Mar 24, 2025
@hye-on hye-on changed the title Resolve duplicate key exception in GetDatafeedRunningStateAction [ML] Resolve duplicate key exception in GetDatafeedRunningStateAction Mar 26, 2025
@AI-IshanBhatt AI-IshanBhatt added the :Search Relevance/Search Catch all for Search Relevance label Mar 26, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Mar 26, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Mar 26, 2025
@valeriy42 valeriy42 added :ml Machine learning Team:ML Meta label for the ML team >bug labels Mar 31, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@davidkyle davidkyle removed Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch :Search Relevance/Search Catch all for Search Relevance labels Mar 31, 2025
@davidkyle davidkyle self-assigned this Mar 31, 2025
@davidkyle
Copy link
Member

Thanks for the contribution @hye-on and welcome to Elasticsearch

Prefer state with more recent searchInterval.startMs when both exist
Prefer states with searchInterval over those without
Default to second state when all criteria are equal

The logic you've applied makes perfect sense to me, this is a practical solution given that searchInterval may be null in some situations.

Please can you add a unit test to cover the logic in the Response::selectMostRecentState function. Create a. new test in GetDatafeedRunningStateActionResponseTests

@davidkyle
Copy link
Member

@elasticmachine test this please

@hye-on
Copy link
Contributor Author

hye-on commented Mar 31, 2025

@davidkyle I’ll add the test! I’ll reach out if I need any help. Thank you for the review! :)

@hye-on
Copy link
Contributor Author

hye-on commented Mar 31, 2025

@davidkyle I’ve added the tests! Thank you!

@davidkyle
Copy link
Member

@elasticmachine test this please

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for the contribution

@davidkyle davidkyle merged commit 89adec1 into elastic:main Apr 1, 2025
17 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug external-contributor Pull request authored by a developer outside the Elasticsearch team :ml Machine learning Team:ML Meta label for the ML team v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ML] GetDatafeedRunningStateAction can fail if multiple local datafeed tasks exist

5 participants